Semantics Altering Modifications for Evaluating Comprehension in Machine Reading
Abstract
Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to correctly assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to correctly process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that, despite their well-known remarkable performance, optimised models consistently struggle to correctly process semantically ...
Related resources
Evaluating Machine Reading Systems through Comprehension Tests
This paper describes a methodology for testing and evaluating the performance of Machine Reading systems through Question Answering and Reading Comprehension Tests. The methodology is being used in QA4MRE (QA for Machine Reading Evaluation), one of the labs of CLEF. We report here the conclusions and lessons learned after the first campaign in 2011.
Adversarial Examples for Evaluating Reading Comprehension Systems
Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about ...
Stochastic Answer Networks for Machine Reading Comprehension
We propose a simple yet robust stochastic answer network (SAN) that simulates multistep reasoning in machine reading comprehension. Compared to previous work such as ReasoNet, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We show that this simple trick improves robustness and achieves result...
Machine Comprehension with Syntax, Frames, and Semantics
We demonstrate significant improvement on the MCTest question answering task (Richardson et al., 2013) by augmenting baseline features with features based on syntax, frame semantics, coreference, and word embeddings, and combining them in a max-margin learning framework. We achieve the best results we are aware of on this dataset, outperforming concurrently published results. These results demon...
Evaluating the Meaning of Answers to Reading Comprehension Questions: A Semantics-Based Approach
There is a rise in interest in the evaluation of meaning in real-life applications, e.g., for assessing the content of short answers. The approaches typically use a combination of shallow and deep representations, but little use is made of the semantic formalisms created by theoretical linguists to represent meaning. In this paper, we explore the use of the underspecified semantic formalism LRS...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i15.17622